Total Jensen divergences: Definition, Properties and k-Means++ Clustering

Authors

  • Frank Nielsen
  • Richard Nock
Abstract

We present a novel class of divergences induced by a smooth convex function called total Jensen divergences. Those total Jensen divergences are invariant by construction to rotations, a feature yielding regularization of ordinary Jensen divergences by a conformal factor. We analyze the relationships between this novel class of total Jensen divergences and the recently introduced total Bregman divergences. We then proceed by defining the total Jensen centroids as average distortion minimizers, and study their robustness performance to outliers. Finally, we prove that the k-means++ initialization that bypasses explicit centroid computations is good enough in practice to guarantee probabilistically a constant approximation factor to the optimal k-means clustering.
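To make the seeding idea concrete, here is a minimal sketch of k-means++ initialization driven by a pluggable divergence. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the conformal factor of the total Jensen divergence, so the sketch uses the ordinary Jensen divergence J_F(p, q) = (F(p) + F(q))/2 - F((p + q)/2) as a stand-in, and the names `jensen_divergence`, `squared_norm`, and `kmeanspp_seeds` are hypothetical. Seeds are drawn from the data points themselves, so no centroid is ever computed.

```python
import random
from typing import Callable, List, Sequence

Point = Sequence[float]


def jensen_divergence(F: Callable[[Point], float], p: Point, q: Point) -> float:
    """Ordinary Jensen divergence induced by a convex generator F.

    Stand-in for the total Jensen divergence, whose conformal factor
    is not stated in the abstract (assumption of this sketch).
    """
    midpoint = [(a + b) / 2 for a, b in zip(p, q)]
    return (F(p) + F(q)) / 2 - F(midpoint)


def squared_norm(x: Point) -> float:
    """Example convex generator: F(x) = ||x||^2, giving J_F = ||p - q||^2 / 4."""
    return sum(v * v for v in x)


def kmeanspp_seeds(
    points: List[Point],
    k: int,
    divergence: Callable[[Point, Point], float],
    rng: random.Random,
) -> List[Point]:
    """Pick k seeds among the data points, k-means++ style.

    The first seed is uniform; each later seed is drawn with probability
    proportional to its divergence to the nearest seed chosen so far.
    """
    seeds = [rng.choice(points)]
    # Divergence from every point to its nearest seed (clamped against
    # tiny negative values caused by floating-point rounding).
    nearest = [max(0.0, divergence(x, seeds[0])) for x in points]
    while len(seeds) < k:
        if sum(nearest) <= 0.0:
            # Every point coincides with a seed: fall back to uniform choice.
            chosen = rng.choice(points)
        else:
            chosen = rng.choices(points, weights=nearest, k=1)[0]
        seeds.append(chosen)
        nearest = [
            min(d, max(0.0, divergence(x, chosen)))
            for x, d in zip(points, nearest)
        ]
    return seeds


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 9.9]]
    div = lambda p, q: jensen_divergence(squared_norm, p, q)
    print(kmeanspp_seeds(data, 2, div, random.Random(0)))
```

Passing the divergence as a callable keeps the seeding loop independent of the distortion measure, so a total Jensen divergence could be swapped in once its conformal factor is taken from the full paper.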

Similar resources

A generalization of the Jensen divergence: The chord gap divergence

We introduce a novel family of distances, called the chord gap divergences, that generalizes the Jensen divergences (also called the Burbea-Rao distances), and study its properties. It follows a generalization of the celebrated statistical Bhattacharyya distance that is frequently met in applications. We report an iterative concave-convex procedure for computing centroids, and analyze the perfo...

Clustering with Bregman Divergences: an Asymptotic Analysis

Clustering, in particular k-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of k-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of the clusters k is large. We establish quantization rates and describe the lim...

On Clustering Histograms with k-Means by Using Mixed α-Divergences

Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the α-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retr...

A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)

Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using some metaheuristic algorithms. In this study, we propose a simulated annealing based algorithm for K-means in the clustering analysis which we refer it a...

Non-flat Clustering with Alpha-divergences

The scope of the well-known k-means algorithm has been broadly extended with some recent results: first, the k-means++ initialization method gives some approximation guarantees; second, the Bregman k-means algorithm generalizes the classical algorithm to the large family of Bregman divergences. The Bregman seeding framework combines approximation guarantees with Bregman divergences. We present h...

Journal title:
  • CoRR

Volume abs/1309.7109

Publication date 2013